MicroNAS: Memory and Latency Constrained Hardware-Aware Neural Architecture Search for Time Series Classification on Microcontrollers
King, Tobias, Zhou, Yexu, Röddiger, Tobias, Beigl, Michael
Designing domain-specific neural networks is a time-consuming, error-prone, and expensive task. Neural Architecture Search (NAS) exists to simplify domain-specific model development, but there is a gap in the literature for time-series classification on microcontrollers. Therefore, we adapt the concept of differentiable neural architecture search (DNAS) to solve the time-series classification problem on resource-constrained microcontrollers (MCUs). We introduce MicroNAS, a domain-specific HW-NAS system integrating DNAS, latency lookup tables, dynamic convolutions, and a novel search space specifically designed for time-series classification on MCUs. The resulting system is hardware-aware and can generate neural network architectures that satisfy user-defined limits on execution latency and peak memory consumption. Our extensive studies on different MCUs and standard benchmark datasets demonstrate that MicroNAS finds MCU-tailored architectures that achieve performance (F1-score) close to state-of-the-art desktop models. We also show that our approach is superior in adhering to memory and latency constraints compared to domain-independent NAS baselines such as DARTS.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (3 more...)
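The latency- and memory-constrained search described in the MicroNAS abstract above can be illustrated with a small sketch: a latency lookup table (LUT) with per-operation costs measured on the target MCU, used to restrict the candidate operations to those that fit the user-defined budgets. All operation names and numbers here are illustrative assumptions, not values from the paper.

```python
# Hypothetical per-operation costs measured on a target MCU (made-up numbers).
LATENCY_LUT_MS = {"conv3": 4.2, "conv5": 9.8, "dyn_conv": 6.1, "pool": 0.7, "skip": 0.0}
PEAK_MEMORY_KB = {"conv3": 24, "conv5": 40, "dyn_conv": 30, "pool": 8, "skip": 0}

def feasible_ops(latency_budget_ms, memory_budget_kb):
    """Candidate operations that fit both user-defined limits."""
    return sorted(
        op for op in LATENCY_LUT_MS
        if LATENCY_LUT_MS[op] <= latency_budget_ms
        and PEAK_MEMORY_KB[op] <= memory_budget_kb
    )

def network_latency(chosen_ops):
    """Estimated end-to-end latency: sum of per-layer LUT entries."""
    return sum(LATENCY_LUT_MS[op] for op in chosen_ops)
```

In a DNAS setting, such a filter (or a differentiable penalty built from the same LUT) keeps the supernet's choices within the latency and memory budgets during the search rather than after it.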
Congestion-aware Distributed Task Offloading in Wireless Multi-hop Networks Using Graph Neural Networks
Zhao, Zhongyuan, Perazzone, Jake, Verma, Gunjan, Segarra, Santiago
ABSTRACT: Computational offloading has become an enabling component for edge intelligence in mobile and smart devices. To fill this gap, we propose a low-overhead, congestion-aware distributed task offloading scheme by augmenting a distributed greedy framework with graph-based machine learning.
[Figure 1: Challenges in distributed multi-hop offloading: (a) probing: nodes 1 and 2 query the communication and computing bandwidth]
1. INTRODUCTION The proliferation of mobile and smart devices enables the collection of rich sensory data from both physical and cyber spaces, leading to many exciting applications, such as connected vehicles, drone/robot swarms, software-defined networks (SDN), and Internet-of-Things (IoT) [1-4]. To support these applications, wireless multi-hop networks, which have traditionally been used for military communications, disaster relief, and sensor networks, are now envisioned ... For offloading in wireless multi-hop networks [17-22], a centralized scheduler with global knowledge of ... However, centralized multi-hop offloading has the drawbacks of single-point-of-failure and poor scalability, due to the high communication overhead of collecting the full network state at a dedicated scheduler. Distributed multi-hop offloading based on pricing [18,21] and learning [22] focuses only on the capacity of servers, while ignoring the potential network congestion caused by offloading [19], as illustrated by the motivating example in Figure 1.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > Los Angeles County > Santa Monica (0.04)
- Information Technology (0.86)
- Government (0.68)
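The congestion-aware offloading idea above can be sketched with a toy per-node greedy rule: offload only if remote compute plus a congestion-inflated transfer time beats local execution. This is an illustrative stand-in, not the paper's algorithm; in the paper the congestion estimate comes from a graph neural network, while here it is simply a given link-utilization figure.

```python
def offload_decision(local_ms, remote_ms, tx_ms, link_utilization):
    """Offload iff remote compute + congestion-inflated transfer time
    is cheaper than executing the task locally.

    link_utilization in [0, 1): higher utilization inflates the
    effective transfer time (a standard queueing-style penalty).
    """
    congestion_factor = 1.0 / max(1e-6, 1.0 - link_utilization)
    remote_total = remote_ms + tx_ms * congestion_factor
    return remote_total < local_ms
```

The point of the penalty term is exactly the failure mode the abstract highlights: a scheme that ignores `link_utilization` would happily offload onto an already-congested path.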
FOSS: A Self-Learned Doctor for Query Optimizer
Zhong, Kai, Sun, Luming, Ji, Tao, Li, Cuiping, Chen, Hong
Various works have utilized deep reinforcement learning (DRL) to address the query optimization problem in database systems. They either learn to construct plans from scratch in a bottom-up manner or guide the plan-generation behavior of a traditional optimizer using hints. While these methods have achieved some success, they face challenges of either low training efficiency or a limited plan search space. To address these challenges, we introduce FOSS, a novel DRL-based framework for query optimization. FOSS initiates optimization from the original plan generated by a traditional optimizer and incrementally refines suboptimal nodes of the plan through a sequence of actions. Additionally, we devise an asymmetric advantage model to evaluate the advantage between two plans. We integrate it with a traditional optimizer to form a simulated environment. Leveraging this simulated environment, FOSS can bootstrap itself to rapidly generate a large number of high-quality simulated experiences. FOSS then learns and improves its optimization capability from these simulated experiences. We evaluate the performance of FOSS on the Join Order Benchmark, TPC-DS, and Stack Overflow. The experimental results demonstrate that FOSS outperforms state-of-the-art methods in terms of latency performance and optimization time. Compared to PostgreSQL, FOSS achieves savings ranging from 15% to 83% in total latency across different benchmarks.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Workflow (1.00)
- Research Report > New Finding (0.34)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.86)
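The incremental refinement loop described in the FOSS abstract can be sketched generically: start from the traditional optimizer's plan and greedily apply the action whose predicted advantage over the current plan is largest, stopping when no action helps. `actions`, `apply_action`, and the advantage model below are hypothetical stand-ins, not FOSS's actual interfaces.

```python
def refine_plan(plan, actions, apply_action, advantage, max_steps=5):
    """Incrementally rewrite suboptimal plan nodes for up to `max_steps` rounds."""
    for _ in range(max_steps):
        candidates = [apply_action(plan, a) for a in actions]
        # advantage(new, old) > 0 means `new` is predicted to be faster.
        best = max(candidates, key=lambda p: advantage(p, plan), default=None)
        if best is None or advantage(best, plan) <= 0:
            break  # no action improves the current plan
        plan = best
    return plan
```

A toy instantiation where a "plan" is just its cost shows the loop converging: with `apply_action = lambda p, a: p - a if p - a >= 4 else p` and `advantage = lambda new, old: old - new`, `refine_plan(10, [1, 2, 3], apply_action, advantage)` descends 10 → 7 → 4 and then stops.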
Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge
Rutishauser, Georg, Conti, Francesco, Benini, Luca
Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to a 28.6% reduction in end-to-end latency compared to an 8-bit model, at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline even on systems with no hardware support for sub-byte arithmetic, again at a negligible accuracy drop. Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
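The hardware-aware heuristic stage described above can be sketched as a greedy budget-driven pass: starting from an all-8-bit network, repeatedly drop to 4 bits the layer with the largest latency saving per unit of accuracy sensitivity, until the latency target is met. This is an assumed simplification of such a heuristic, not the paper's exact procedure; all sensitivities and latencies are illustrative.

```python
def lower_precision(lat8, lat4, sensitivity, target_ms):
    """Return (per-layer bit-widths, total latency); widths are 8 or 4."""
    bits = [8] * len(lat8)
    total = sum(lat8)
    while total > target_ms:
        # Rank remaining 8-bit layers by latency saving per sensitivity.
        candidates = [
            (i, (lat8[i] - lat4[i]) / sensitivity[i])
            for i in range(len(bits)) if bits[i] == 8
        ]
        if not candidates:
            break  # budget unreachable even with every layer at 4 bits
        i, _ = max(candidates, key=lambda c: c[1])
        bits[i] = 4
        total -= lat8[i] - lat4[i]
    return bits, total
```

Note the hardware-awareness lives in `lat8`/`lat4`: on a platform with no sub-byte arithmetic support, a layer's measured 4-bit latency may barely improve on its 8-bit latency, so the heuristic naturally leaves it at 8 bits.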
Fast Distributed Multi-agent Plan Execution with Dynamic Task Assignment and Scheduling
Shah, Julie A. (Massachusetts Institute of Technology) | Conrad, Patrick R. (Massachusetts Institute of Technology) | Williams, Brian C. (Massachusetts Institute of Technology)
An essential quality of a good partner is her responsiveness to other team members. Recent work in dynamic plan execution exhibits elements of this quality through the ability to adapt to the temporal uncertainties of other agents and the environment. However, a good teammate also has the ability to adapt on-the-fly through task assignment. We generalize the framework of dynamic execution to perform plan execution with dynamic task assignment as well as scheduling. This paper introduces Chaski, a multi-agent executive for scheduling temporal plans with online task assignment. Chaski enables an agent to dynamically update its plan in response to disturbances in the task assignment and schedule of other agents. The agent then uses the updated plan to choose, schedule, and execute actions that are guaranteed to be temporally consistent and logically valid within the multi-agent plan. Chaski is made efficient through an incremental algorithm that compactly encodes all scheduling policies for all possible task assignments. We apply Chaski to perform multi-manipulator coordination using two Barrett Arms within the authors' hardware testbed. We empirically demonstrate up to one order of magnitude improvements in execution latency and solution compactness compared to prior art.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
- (2 more...)
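The "temporally consistent" guarantee in the Chaski abstract is conventionally checked on a simple temporal network (STN): a schedule's constraints are consistent iff the corresponding distance graph contains no negative cycle. A minimal Floyd-Warshall check, with the constraint encoding as an assumption about how such an executive might represent its plan (not Chaski's actual data structures):

```python
INF = float("inf")

def stn_consistent(n, edges):
    """edges: triples (u, v, w) encoding time[v] - time[u] <= w.
    Consistent iff the distance graph has no negative cycle."""
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
    for k in range(n):  # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return all(d[i][i] >= 0 for i in range(n))
```

For example, "event 1 at least 3 but at most 5 time units after event 0" is the pair `(0, 1, 5)` and `(1, 0, -3)`, which is consistent; tightening the upper bound to 2 makes the window empty and the check fails.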